---
author: Stéphane Laurent
date: '2017-06-06'
highlighter: kate
output:
html_document:
keep_md: False
md_document:
toc: True
variant: markdown
prettify: True
prettifycss: 'twitter-bootstrap'
tags: 'maths, statistics'
title: 'The geometry of the balanced ANOVA model (with fixed effects)'
---
- [Standard normal distribution on a vector
space](#standard-normal-distribution-on-a-vector-space)
- [The balanced ANOVA model](#the-balanced-anova-model)
- [Tensor product](#tensor-product)
- [Least-squares estimates](#least-squares-estimates)
The mathematical treatment of Gaussian linear models most often starts
with the matrix form $Y=X\beta+\sigma G$, where $Y$ is a
random vector modelling the $n$ response values, $X$ is a known matrix,
$\beta$ is the vector of unknown parameters, and $G$ has the standard
normal distribution on $\mathbb{R}^n$.
There are good reasons to use this matrix form. However, it is
cleaner to treat the theory with the equivalent vector space notation
$Y = \mu + \sigma G$, where $\mu$ is assumed to lie in a linear subspace
$W$ of $\mathbb{R}^n$, corresponding to $\text{Im}(X)$ in the matrix
notation. For example, denoting by $P_W$ the orthogonal projection on
$W$, the least-squares estimate $\hat\mu$ of $\mu$ is simply given by
$\hat\mu=P_W y$, while $P^\perp_W y$ is the vector of residuals, denoting
by $P^\perp_W$ the projection on the orthogonal complement of $W$. Thus
there is no need to consider $W=\text{Im}(X)$ to derive the general
principles of the theory. The balanced one-way ANOVA model, which is the
topic of this article, illustrates this approach.
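As a quick illustration of this point of view, here is a minimal sketch (with a simulated toy example and an arbitrary design matrix, not tied to the ANOVA model discussed below) comparing the projection-based fit $P_W y$ with the fitted values returned by R's `lm()`:

``` r
## Minimal sketch: least squares as an orthogonal projection.
## The design matrix below is an arbitrary illustrative choice.
set.seed(666)
X <- cbind(1, rep(c(0, 1), each = 3))            # a design matrix; W = Im(X)
beta <- c(2, 1); sigma <- 0.5
y <- c(X %*% beta) + sigma * rnorm(nrow(X))      # y = X beta + sigma g
P_W <- X %*% solve(crossprod(X)) %*% t(X)        # orthogonal projection on W
mu_hat <- c(P_W %*% y)                           # least-squares estimate of mu
res <- y - mu_hat                                # the residuals, i.e. the projection on the orthogonal complement of W
all.equal(mu_hat, unname(fitted(lm(y ~ X - 1)))) # TRUE
```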
Standard normal distribution on a vector space
----------------------------------------------
The main tool used to treat the theory of Gaussian linear models is the
standard normal distribution on a linear space.
[Theorem and definition]{#boxtitle style="color:blue;font-size:30px;"}
Let $X$ be an $\mathbb{R}^n$-valued random vector, and
$W \subset \mathbb{R}^n$ be a linear space. We say that $X$ has the
standard normal distribution on the vector space $W$, and we write
$X \sim SN(W)$, if it takes its values in $W$ and its characteristic
function is given by
$$\mathbb{E} \textrm{e}^{i\langle w, X \rangle} = \textrm{e}^{-\frac12{\Vert w \Vert}^2} \quad \text{for all } w \in W.$$
The following three assertions are equivalent (and this is easy to
prove):
1. $X \sim SN(W)$;
2. the coordinates of $X$ in some
orthonormal basis of $W$ are i.i.d. standard normal random variables;
3. the coordinates of $X$ in any orthonormal basis of $W$ are
i.i.d. standard normal random variables.
Of course, we recover the standard normal distribution on $\mathbb{R}^n$
when taking $W=\mathbb{R}^n$.
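Assertion 2 also provides a convenient way to simulate $SN(W)$. Here is a minimal sketch, where $W$ is an arbitrary (randomly generated) $3$-dimensional subspace of $\mathbb{R}^5$; the empirical covariance of the simulations should be close to the projection matrix of $W$:

``` r
## Sketch: simulating SN(W) via an orthonormal basis of W (assertion 2).
set.seed(314)
B <- qr.Q(qr(matrix(rnorm(5 * 3), 5, 3)))  # orthonormal basis of W, stored as columns
rSNW <- function(nsims) {
  B %*% matrix(rnorm(ncol(B) * nsims), ncol(B), nsims)  # one draw of X ~ SN(W) per column
}
X <- rSNW(10000)
round(cov(t(X)), 2)  # close to the projection matrix B %*% t(B)
```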
From this definition-theorem, the so-called *Cochran's theorem* is an
obvious statement. More precisely, if $U \subset W$ is a linear space,
and $Z=U^\perp \cap W$ is the orthogonal complement of $U$ in $W$, then
the projection $P_UX$ of $X$ on $U$ has the standard normal distribution
on $U$, similarly the projection $P_ZX$ of $X$ on $Z$ has the standard
normal distribution on $Z$, and moreover $P_UX$ and $P_ZX$ are
independent. This is straightforward to see from the definition-theorem
of $SN(W)$, and it is also easy to see that
${\Vert P_UX\Vert}^2 \sim \chi^2_{\dim(U)}$.
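Continuing the sketch above, these facts are easy to check numerically, taking for $U$ the line spanned by the first basis vector:

``` r
## Numerical check of Cochran's theorem, continuing the previous sketch.
P_U <- B[, 1, drop = FALSE] %*% t(B[, 1, drop = FALSE])  # projection on U
P_Z <- B[, -1] %*% t(B[, -1])                            # projection on Z, the orthogonal complement of U in W
normsU2 <- colSums((P_U %*% X)^2)   # simulations of ||P_U X||^2
normsZ2 <- colSums((P_Z %*% X)^2)   # simulations of ||P_Z X||^2
c(mean(normsU2), mean(normsZ2))     # close to 1 and 2, the dimensions of U and Z
cor(normsU2, normsZ2)               # close to 0, in line with the independence
```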
The balanced ANOVA model
------------------------
The balanced ANOVA model is used to model a sample $y=(y_{ij})$ with a
tabular structure: $$
y=\begin{pmatrix}
y_{11} & \ldots & y_{1J} \\
\vdots & y_{ij} & \vdots \\
y_{I1} & \ldots & y_{IJ}
\end{pmatrix},
$$ $y_{ij}$ denoting the $j$-th measurement in group $i$. It is assumed
that the $y_{ij}$ are independent and the population mean depends on the
group index $i$. More precisely, the $y_{ij}$ are modelled by independent
random variables $Y_{ij} \sim {\cal N}(\mu_i, \sigma^2)$.
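For concreteness, here is how such a sample could be simulated in R while keeping its tabular layout (the values of $I$, $J$, the $\mu_i$ and $\sigma$ below are arbitrary illustrative choices):

``` r
## Sketch: simulating a balanced one-way ANOVA sample; row i holds group i.
I <- 2; J <- 3; sigma <- 1
mu_groups <- c(1, 4)                                               # mu_1, mu_2
y <- matrix(mu_groups, I, J) + sigma * matrix(rnorm(I * J), I, J)  # y_ij ~ N(mu_i, sigma^2)
```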
So, how do we write this model as $Y=\mu + \sigma G$ where
$G \sim SN(\mathbb{R}^n)$ and $\mu$ lies in a linear space
$W \subset \mathbb{R}^n$?
Tensor product
--------------
Here $n=IJ$ and one could consider $Y$ as the vector obtained by
stacking the $Y_{ij}$. For example, if $I=2$ and $J=3$, we would write
$$Y={(Y_{11}, Y_{12}, Y_{13}, Y_{21}, Y_{22}, Y_{23})}'.$$
Actually, it is not a good idea to lose the tabular structure. The
appropriate approach for writing the balanced ANOVA model involves the
*tensor product*. We keep the tabular structure of the data:
$$Y = \begin{pmatrix}
Y_{11} & Y_{12} & Y_{13} \\
Y_{21} & Y_{22} & Y_{23}
\end{pmatrix}$$ and we take
$$G \sim SN(\mathbb{R}^I\otimes\mathbb{R}^J)$$ where the *tensor product*
$\mathbb{R}^I\otimes\mathbb{R}^J$ of $\mathbb{R}^I$ and $\mathbb{R}^J$
is nothing but the space of matrices with $I$ rows and $J$ columns. Here
$$
\mu = \begin{pmatrix}
\mu_1 & \mu_1 & \mu_1 \\
\mu_2 & \mu_2 & \mu_2
\end{pmatrix},
$$ lies, as we will see, in a linear space
$W \subset \mathbb{R}^I\otimes\mathbb{R}^J$ which it is convenient to
define with the help of the *tensor product* $x \otimes y$ of two
vectors $x \in \mathbb{R}^I$ and $y \in \mathbb{R}^J$, defined as the
element of $\mathbb{R}^I\otimes\mathbb{R}^J$ given by $$
{(x \otimes y)}_{ij}=x_iy_j.
$$ Not all elements of $\mathbb{R}^I\otimes\mathbb{R}^J$ can be written
as $x \otimes y$, but the vectors $x \otimes y$ span
$\mathbb{R}^I\otimes\mathbb{R}^J$. Consistently with this notation,
the tensor product $U \otimes V$ of two vector spaces
$U \subset \mathbb{R}^I$ and $V \subset \mathbb{R}^J$ is defined as the
vector space spanned by the vectors of the form $x \otimes y$ with
$x \in U$, $y \in V$.
Then\
$$
\mu = (\mu_1, \mu_2) \otimes (1,1,1),
$$ and $\mu$ is assumed to lie in the linear space $$
W = \mathbb{R}^I \otimes [{\bf 1}_J].
$$
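In R, the tensor product of two vectors is nothing but the outer product. As an illustration, with the arbitrary values $\mu_1 = 1$ and $\mu_2 = 4$:

``` r
## The tensor product of two vectors is the outer product (`%o%`, or `outer()`) in R.
mu_groups <- c(1, 4)          # illustrative values of mu_1, mu_2
ones_J <- rep(1, 3)           # the vector 1_J, with J = 3
mu <- mu_groups %o% ones_J    # mu = (mu_1, mu_2) tensor 1_J
mu                            # a 2 x 3 matrix whose rows are (1, 1, 1) and (4, 4, 4)
```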
Moreover, there is a nice orthogonal decomposition of $W$ corresponding
to the other usual parameterization of the model:
$$\boxed{\mu_i = m + \alpha_i} \quad \text{with } \sum_{i=1}^I\alpha_i=0.$$
Indeed, writing $\mathbb{R}^I=[{\bf 1}_I] \oplus {[{\bf 1}_I]}^\perp$
yields the following decomposition of $\mu$: $$
\begin{align*}
\mu = (\mu_1, \ldots, \mu_I) \otimes {\bf 1}_J & =
\begin{pmatrix}
m & m & m \\
m & m & m
\end{pmatrix} +
\begin{pmatrix}
\alpha_1 & \alpha_1 & \alpha_1 \\
\alpha_2 & \alpha_2 & \alpha_2
\end{pmatrix} \\
& = \underset{\in \bigl([{\bf 1}_I]\otimes[{\bf 1}_J]\bigr)}{\underbrace{m({\bf 1}_I\otimes{\bf 1}_J)}} + \underset{\in \bigl([{\bf 1}_I]^{\perp}\otimes[{\bf 1}_J] \bigr)}{\underbrace{(\alpha_1,\ldots,\alpha_I)\otimes{\bf 1}_J}}
\end{align*}
$$
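A quick numerical check of this decomposition, with the same illustrative values as before:

``` r
## Sketch: the two pieces of the decomposition of mu, and their orthogonality.
I <- 2; J <- 3
mu_groups <- c(1, 4)
m <- mean(mu_groups)                       # m
alpha <- mu_groups - m                     # the alpha_i, summing to zero
piece_U <- m * (rep(1, I) %o% rep(1, J))   # m (1_I tensor 1_J)
piece_Z <- alpha %o% rep(1, J)             # (alpha_1, ..., alpha_I) tensor 1_J
all.equal(piece_U + piece_Z, mu_groups %o% rep(1, J))  # TRUE: recovers mu
sum(piece_U * piece_Z)                     # 0: the two pieces are orthogonal
```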
Least-squares estimates
-----------------------
With the theory introduced above, the least-squares estimates of $m$ and
the $\alpha_i$ are given by $\hat m({\bf 1}_I\otimes{\bf 1}_J) = P_U y$
and $\hat\alpha\otimes{\bf 1}_J = P_Zy$ where
$U = [{\bf 1}_I]\otimes[{\bf 1}_J]$ and
$Z = {[{\bf 1}_I]}^{\perp}\otimes[{\bf 1}_J] = U^\perp \cap W$, and we
also know that $\hat m$ and the $\hat\alpha_i$ are independent. The
least-squares estimates of the $\mu_i$ are given by
$\hat\mu_i=\hat m +\hat\alpha_i$. Deriving the expressions of these
estimators and their distributions is left as an exercise to the reader.
As another exercise, check that $$
{\Vert P_Z \mu \Vert}^2 = J \sum_{i=1}^I {(\mu_i - \bar\mu_\bullet)}^2 =
J \sum_{i=1}^I \alpha_i^2.
$$
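For readers who want to check their answers, here is a sketch (with simulated data and the same illustrative parameter values) computing the projections numerically from the projection matrices on $[{\bf 1}_I]$ and $[{\bf 1}_J]$:

``` r
## Sketch: the least-squares estimates as projections, and the identity above.
set.seed(42)
I <- 2; J <- 3; sigma <- 1
mu_groups <- c(1, 4)
mu <- mu_groups %o% rep(1, J)
Y <- mu + sigma * matrix(rnorm(I * J), I, J)
P1I <- matrix(1 / I, I, I)                 # projection matrix on [1_I] in R^I
P1J <- matrix(1 / J, J, J)                 # projection matrix on [1_J] in R^J
P_U_Y <- P1I %*% Y %*% P1J                 # projection of Y on U
P_Z_Y <- Y %*% P1J - P_U_Y                 # projection of Y on Z
m_hat <- P_U_Y[1, 1]                       # hat(m)
alpha_hat <- P_Z_Y[, 1]                    # the hat(alpha)_i
all.equal(m_hat + alpha_hat, rowMeans(Y))  # TRUE: hat(mu)_i is the mean of group i
## the identity of the last exercise:
P_Z_mu <- mu %*% P1J - P1I %*% mu %*% P1J
alpha <- mu_groups - mean(mu_groups)
all.equal(sum(P_Z_mu^2), J * sum(alpha^2)) # TRUE
```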